Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation

Authors

  • Elias Azarov
  • Maxim Vashkevich
  • Alexander A. Petrovsky
Abstract

This paper introduces a framework for parametric speech modeling that can be used in various speech applications, such as text-to-speech synthesis and voice conversion. To reduce the impact of pitch variations, the harmonic analysis is performed on a warped time scale that is aligned with the instantaneous pitch values. Each harmonic is assumed to have its own periodic excitation source that evolves in time and can be modeled as a sum of several sinusoidal components with closely spaced frequencies. The parameters of the excitation components are estimated using a modified instantaneous Prony's method. The proposed analysis/synthesis technique is compared with TANDEM-STRAIGHT.
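The pitch-aligned time warping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warp_to_constant_pitch`, the parameter `samples_per_period`, and the use of linear interpolation are all assumptions made for illustration, and the instantaneous pitch contour `f0` is assumed to be already known and strictly positive. The idea is to integrate the pitch to obtain a phase in cycles, then resample the signal on a uniform grid in that phase, so that every pitch period spans the same number of samples.

```python
import numpy as np

def warp_to_constant_pitch(x, f0, fs, samples_per_period=64):
    """Resample x so that each pitch period spans samples_per_period samples.

    x  : 1-D signal
    f0 : instantaneous pitch in Hz, one value per sample of x (assumed > 0)
    fs : sampling rate in Hz
    """
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    n = np.arange(len(x))
    # Pitch phase in cycles: running integral of f0 over time.
    phase = np.concatenate(([0.0], np.cumsum(f0[:-1]) / fs))
    # Uniform grid in phase, i.e. the warped time axis.
    grid = np.arange(0.0, phase[-1], 1.0 / samples_per_period)
    # Map each warped-time point back to a fractional sample position,
    # then read the signal there by linear interpolation.
    positions = np.interp(grid, phase, n)
    return np.interp(positions, n, x)

# Illustrative input: a sinusoid whose pitch glides from 100 Hz to 200 Hz.
fs = 16000
t = np.arange(fs) / fs
f0 = 100.0 + 100.0 * t
phase = np.concatenate(([0.0], np.cumsum(f0[:-1]) / fs))
x = np.sin(2 * np.pi * phase)

y = warp_to_constant_pitch(x, f0, fs)
# After warping, y is close to a fixed-frequency sinusoid with a period
# of 64 samples, so a harmonic analysis of y sees a stationary pitch.
```

In the warped domain the harmonics sit at fixed multiples of one cycle per `samples_per_period` samples, which is what makes a subsequent harmonic analysis less sensitive to pitch variation. A practical system would likely use a higher-order interpolator than the linear one assumed here.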


Similar resources

Real-time and non-real-time voice conversion systems with web interfaces

Two speech processing systems have been developed for real-time and non-real-time voice conversion. With real-time processing, the user can apply conversion during voice over IP (VoIP) calls, imitating the identity of a specified target speaker. The non-real-time system converts prerecorded audiobooks read by a professional reader, imitating the voice of the user. Both systems require some spe...

Harmonic alternatives to sine-wave speech

Sine-wave speech (SWS) is a three-tone replica of speech, conventionally created by matching each constituent sinusoid in amplitude and frequency with the corresponding vocal tract resonance (formant). We propose an alternative technique of starting from a high-quality multicomponent sinusoidal representation, then decimating this model to only three components per frame. In contrast to SWS, t...

This is a placeholder. Final title will be filled later

Sine-wave speech (SWS) is a three-tone replica of speech, conventionally created by matching each constituent sinusoid in amplitude and frequency with the corresponding vocal tract resonance (formant). We propose an alternative technique where we take a high-quality multicomponent sinusoidal representation and decimate this model so that there are only three components per frame. In contrast to...

Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis

Sinusoidal modeling of speech has been successfully applied to a broad range of speech analysis, synthesis, and modification tasks. However, developing a high-fidelity full-band sinusoidal model that preserves its quality under speech transformation remains an open research problem. Such a system can be extremely useful for high-quality speech synthesis. In this paper we present an enhanced...

AM-FM estimation for speech based on a time-varying sinusoidal model

In this paper we present a method based on a time-varying sinusoidal model for robust and accurate estimation of amplitude and frequency modulations (AM-FM) in speech. The suggested approach has two main steps. First, speech is represented by a sinusoidal model with time-varying amplitudes. Specifically, the model makes use of a first-order time polynomial with complex coefficients for capturing ...


Publication date: 2013